208 research outputs found

A Genealogical Interpretation of Principal Components Analysis

Author: AG Fix
AL Price
D Reich
G Barbujani
GA McVean
Gil McVean
HM Wilkinson-Herbots
J Baik
J Novembre
J Novembre
L Chikhi
LL Cavalli-Sforza
M Currat
M Slatkin
Molly Przeworski
N Patterson
P Debashis
S Klopfstein
S Schaffner
Publication venue: Public Library of Science
Publication date: 01/01/2009

Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's FST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
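As a rough illustration of what "projecting samples onto the principal components" means for SNP data, the sketch below centres each SNP, builds the sample-by-sample covariance matrix, and finds its leading eigenvector by power iteration. It is a generic PCA sketch, not the coalescent-time derivation described in the abstract; the function name `first_pc_coordinates` and the toy genotypes are assumptions made for illustration.

```python
import math


def first_pc_coordinates(genotypes, n_iter=200):
    """Return sample coordinates on the first principal component.

    genotypes: list of samples, each a list of 0/1 allele states per SNP.
    The result is the leading eigenvector of the sample covariance matrix,
    so coordinates are defined only up to scale and sign.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    # Centre each SNP (column) by its mean allele frequency.
    means = [sum(row[j] for row in genotypes) / n for j in range(n_snps)]
    centred = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
    # Sample-by-sample covariance matrix, averaged over SNPs.
    cov = [[sum(a * b for a, b in zip(centred[i], centred[k])) / n_snps
            for k in range(n)] for i in range(n)]
    # Power iteration from a fixed, non-constant starting vector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        new = [sum(cov[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return [0.0] * n  # no variation left to project onto
        vec = [x / norm for x in new]
    return vec


# Toy data (hypothetical): two samples from each of two diverged groups.
toy = [[0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
coords = first_pc_coordinates(toy)
```

In this toy case the first two samples land on one side of the axis and the last two on the other, which is the kind of separation the abstract relates to pairwise coalescent times.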

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

Author: A Bhaskar
A Bhaskar
A Drummond
EJ Candès
FL Nazarov
GA McVean
H Li
J Heled
J Kim
J Terhorst
J Terhorst
L Excoffier
M Kimura
M Nordborg
P Turán
R Nielsen
RA Blythe
S Myers
S Schiffels
S Sheehan
TA Joseph
W Gautschi
Y Hua
Publication date: 08/05/2019

Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure, that is, the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem. Comment: 38 pages, Appeared in RECOMB 201

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Genomic signatures of population decline in the malaria mosquito Anopheles gambiae

Author: A De
AD Stump
Austin Burt
CG Nevill
Charles Mbogo
D Bachtrog
EA Okiro
F Tajima
F Tajima
Franklin Mosha
G McVean
GA Watterson
H Li
H Li
H Li
Janet Midega
JH Gillespie
JK Pritchard
JM Mwangangi
K Tamura
K Thornton
KA Lindblade
M Kirin
M Pombi
MN Bayoh
NJ Govella
P Andolfatto
PC Kipyab
PP Chaki
PR Haddrill
RF Daniels
RN Gutenkunst
RR Hudson
S Schiffels
Samantha M. O’Loughlin
SC Nkhoma
SM O’Loughlin
Stephen M. Magesa
TL Russell
WG Hill
Publication venue: BioMed Central
Publication date: 24/03/2016

Population genomic features such as nucleotide diversity and linkage disequilibrium are expected to be strongly shaped by changes in population size, and might therefore be useful for monitoring the success of a control campaign. In the Kilifi district of Kenya, there has been a marked decline in the abundance of the malaria vector Anopheles gambiae subsequent to the rollout of insecticide-treated bed nets. To investigate whether this decline left a detectable population genomic signature, simulations were performed to compare the effect of population crashes on nucleotide diversity, Tajima's D, and linkage disequilibrium (as measured by the population recombination parameter ρ). Linkage disequilibrium and ρ were estimated for An. gambiae from Kilifi and compared with values for Anopheles arabiensis and Anopheles merus at the same location, and for An. gambiae in a location 200 km from Kilifi. In the simulations, ρ changed more rapidly after a population crash than the other statistics, and is therefore a more sensitive indicator of recent population decline. In the empirical data, linkage disequilibrium extends 100-1000 times further, and ρ is 100-1000 times smaller, for the Kilifi population of An. gambiae than for any of the other populations. There were also significant runs of homozygosity in many of the individual An. gambiae mosquitoes from Kilifi. These results support the hypothesis that the recent decline in An. gambiae was driven by the rollout of bed nets. Measuring population genomic parameters in a small sample of individuals before, during and after vector or pest control may be a valuable method of tracking the effectiveness of interventions.
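Two of the statistics named in this abstract, nucleotide diversity and Tajima's D, can be computed directly from aligned haplotypes. The sketch below uses the standard Tajima (1989) constants; the function name `tajimas_d` and the toy haplotypes are hypothetical, and this is not the study's own pipeline.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings.

    Returns None when there are no segregating sites, where D is undefined.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes for a non-zero variance")
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    seg = sum(1 for j in range(length)
              if len({h[j] for h in haplotypes}) > 1)
    if seg == 0:
        return None
    # pi: mean number of pairwise differences (nucleotide diversity per locus).
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Tajima's normalising constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy haplotypes (hypothetical): only singletons versus only
# intermediate-frequency variants.
singletons = ["1000", "0100", "0010", "0001"]
balanced = ["1111", "1111", "0000", "0000"]
```

An excess of rare variants (the `singletons` sample) gives a negative D, while an excess of intermediate-frequency variants (the `balanced` sample) gives a positive D.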

Crossref

Springer - Publisher Connector

ZENODO

PubMed Central

Spiral - Imperial College Digital Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Recombination rate and selection strength in HIV intra-patient evolution

Author: A Eyre-Walker
A Jung
AE Jetzt
AR Templeton
AS Perelson
B Asquith
C Charpentier
C Kuiken
C Neuhauser
Christophe Fraser
CL Althaus
CTT Edwards
D Shriner
D Shriner
DJ Wilson
DN Levy
E Jones
E Simon-Loriere
G McVean
GA Bazykin
HY Lee
IM Rouzine
IM Rouzine
J Archer
J Chen
J Hunter
J Zhuang
JH Gillespie
L Chen
M Kimura
N Barton
N Barton
R Nielsen
R Shankarappa
RA Kaslow
RA Neher
RC Edgar
RC Griffiths
Richard A. Neher
RR Hudson
S Duffy
SA Seibert
SL Liu
T Leitner
T Nora
T Oliphant
Thomas Leitner
WJ Ewens
Y Yamaguchi
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2009

The evolutionary dynamics of HIV during the chronic phase of infection is driven by the host immune response and by selective pressures exerted through drug treatment. To understand and model the evolution of HIV quantitatively, the parameters governing genetic diversification and the strength of selection need to be known. While mutation rates can be measured in single replication cycles, the relevant effective recombination rate depends on the probability of coinfection of a cell with more than one virus and can only be inferred from population data. However, most population genetic estimators for recombination rates assume absence of selection and are hence of limited applicability to HIV, since positive and purifying selection are important in HIV evolution. Here, we estimate the rate of recombination and the distribution of selection coefficients from time-resolved sequence data tracking the evolution of HIV within single patients. By examining temporal changes in the genetic composition of the population, we estimate the effective recombination to be r=1.4e-5 recombinations per site and generation. Furthermore, we provide evidence that selection coefficients of at least 15% of the observed non-synonymous polymorphisms exceed 0.8% per generation. These results provide a basis for a more detailed understanding of the evolution of HIV. A particularly interesting case is evolution in response to drug treatment, where recombination can facilitate the rapid acquisition of multiple resistance mutations. With the methods developed here, more precise and more detailed studies will be possible, as soon as data with higher time resolution and greater sample sizes is available. Comment: to appear in PLoS Computational Biolog

arXiv.org e-Print Archive

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

edoc

PubMed Central

Sexual selection protects against extinction

Reproduction through sex carries substantial costs, mainly because only half of sexual adults produce offspring. It has been theorised that these costs could be countered if sex allows sexual selection to clear the universal fitness constraint of mutation load. Under sexual selection, competition between (usually) males, and mate choice by (usually) females create important intraspecific filters for reproductive success, so that only a subset of males gains paternity. If reproductive success under sexual selection is dependent on individual condition, which depends on mutation load, then sexually selected filtering through ‘genic capture’ could offset the costs of sex because it provides genetic benefits to populations. Here, we test this theory experimentally by comparing whether populations with histories of strong versus weak sexual selection purge mutation load and resist extinction differently. After evolving replicate populations of the flour beetle Tribolium castaneum for ~7 years under conditions that differed solely in the strengths of sexual selection, we revealed mutation load using inbreeding. Lineages from populations that had previously experienced strong sexual selection were resilient to extinction and maintained fitness under inbreeding, with some families continuing to survive after 20 generations of sib × sib mating. By contrast, lineages derived from populations that experienced weak or non-existent sexual selection showed rapid fitness declines under inbreeding, and all were extinct after generation 10. Multiple mutations across the genome with individually small effects can be difficult to clear, yet sum to a significant fitness load; our findings reveal that sexual selection reduces this load, improving population viability in the face of genetic stress

Crossref

ZENODO

Dryad Digital Repository (Duke University)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Electronic Archiving System

Digital.CSIC

Jagiellonian University Repository

University of East Anglia digital repository

Forward-time simulation of realistic samples for genome-wide association studies

Author: A Carvajal-Rodriguez
A Carvajal-Rodriguez
AL Price
B Devlin
B Peng
B Peng
B Peng
B Peng
B Weir
BF Voight
Bo Peng
BW Lambert
C Li
C Pfaff
CC Spencer
CC Spencer
CC Wu
Christopher I Amos
CI Amos
CI Amos
CJ Hoggart
D Altshuler
D Li
D Reich
E Lander
FA Wright
G Ayodo
G McVean
GA McVean
GAT McVean
GK Chen
H Tang
HS Chai
HY Tan
J Marchini
J Wise
JC Barrett
JC Long
JD Wall
JK Pritchard
JK Pritchard
L Liang
M Chadeau-Hyam
M Kimura
M Li
M Slatkin
M Slatkin
MI McCarthy
MW Smith
P Marjoram
PC Sham
RR Hudson
S Myers
S Wiltshire
S Zollner
T Mailund
T Mehta
TH Consortia
W Knowler
WJ Ewens
X Zhu
Y Wang
Z Bochdanovits
Publication venue: BioMed Central
Publication date: 01/09/2010

Background: Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented the power of this simulation method from being fully explored in existing forward-time simulation methods. Results: Using a general-purpose forward-time population genetics simulation environment, we developed a forward-time simulation method that can be used to simulate realistic samples for genome-wide association studies. We examined the properties of this simulation method by comparing simulated samples with real data and demonstrated its wide applicability using four examples, including a simulation of case-control samples with a disease caused by multiple interacting genetic and environmental factors, a simulation of trio families affected by a disease-predisposing allele that had been subjected to either slow or rapid selective sweep, and a simulation of a structured population resulting from recent population admixture. Conclusions: Our algorithm simulates populations that closely resemble the complex structure of the human genome, while allowing the introduction of signals of natural selection. Because of its flexibility to generate different types of samples with arbitrary disease or quantitative trait models, this simulation method can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for genome-wide association studies.
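To make "forward-time simulation" concrete, here is a minimal sketch of the simplest such model: a neutral Wright-Fisher population in which each generation's allele copies are resampled from the previous generation's frequency. It is not the simulation environment used in the paper; the function name `wright_fisher` and all parameter values are assumptions for illustration, and selection, recombination and disease models are deliberately left out.

```python
import random


def wright_fisher(pop_size, init_freq, generations, seed=0):
    """Simulate one allele's frequency forward in time under pure drift.

    pop_size: number of diploid individuals (2 * pop_size allele copies).
    Returns the frequency trajectory, including the starting frequency.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_copies = 2 * pop_size
    count = round(init_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each copy in the next generation picks a parent copy at random.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


# Hypothetical run: 100 diploids, starting frequency 0.2, 50 generations.
freqs = wright_fisher(pop_size=100, init_freq=0.2, generations=50)
```

Because time runs forward, extra per-generation steps (selection, migration, admixture) can be slotted into the loop, which is the flexibility the abstract attributes to this class of methods.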

Crossref

Directory of Open Access Journals

PubMed Central

Pervasive Hitchhiking at Coding and Regulatory Sites in Humans

Author: A Eyre-Walker
A Siepel
AFA Smit
AJ Berry
AJ Jeffreys
AR Boyko
B Charlesworth
B Charlesworth
BF Voight
CC Spencer
CD Bustamante
CD Bustamante
D Charlesworth
D Karolchik
DA Hinds
DA Hinds
DA Wheeler
DJ Begun
DJ Begun
Dmitri A. Petrov
F Tajima
G Bejerano
G Coop
G Sella
GA McVean
GA McVean
GA Watterson
Gil McVean
Guy Sella
H Innan
HE Hoekstra
I Gordo
I Hellmann
I Hellmann
I Hellmann
J Charlesworth
J Charlesworth
J Maynard Smith
J. Michael Macpherson
JA Shapiro
James J. Cai
JC Fay
JC Fay
JC Fay
JD Jensen
JD Thompson
JH Gillespie
JH McDonald
JJ Cai
JJ Cai
JJ Welch
JL Kelley
JL Kelley
JM Braverman
JM Macpherson
JM Macpherson
K Bullaughey
L Zhang
M Goodman
M Kimura
M Przeworski
M Przeworski
M Przeworski
MD Shapiro
MH Kohn
MW Nachman
N Ahituv
N Bierne
N Takahata
NG Smith
NL Kaplan
P Andolfatto
P Andolfatto
P Andolfatto
P Gajer
PC Sabeti
PF Colosimo
PJ Daborn
R Nielsen
R Nielsen
RH Waterston
RJ Kulathinal
S Myers
SA Sawyer
SA Tishkoff
SE Ptak
SH Williamson
SW Doniger
TJ Hubbard
W Winckler
Y Kim
YT Aminetzach
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald–Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites (either recurrent selective sweeps or background selection) on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Chapman University Digital Commons

An Approximate Bayesian Estimator Suggests Strong, Recurrent Selective Sweeps in Drosophila

Author: A Eyre-Walker
B Harr
BP Lazzaro
D Bachtrog
DJ Begun
ET Cirulli
F Tajima
G Coop
G Coop
GA McVean
Gil McVean
H Li
JC Fay
JD Jensen
JD Jensen
Jeffrey D. Jensen
JH McDonald
JK Pritchard
JL Kelly
JM Braverman
JM Macpherson
JM Maynard Smith
Kevin R. Thornton
KL Simonsen
KR Thornton
KR Thornton
L Ometto
M Przeworski
M Przeworski
M Przeworski
MA Beaumont
N Bierne
NG Smith
NL Kaplan
P Andolfatto
P Andolfatto
P Haddrill
P Marjoram
Peter Andolfatto
R Durrett
RR Hudson
SA Sawyer
SA Sawyer
SI Wright
THE Wiehe
W Stephan
Y Kim
Y Kim
Y-X Fu
Publication venue: Public Library of Science
Publication date: 01/01/2008

The recurrent fixation of newly arising, beneficial mutations in a species reduces levels of linked neutral variability. Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. We propose an approximate Bayesian (ABC) polymorphism-based estimator that can be used to distinguish between these models, and apply it to multi-locus data from Drosophila melanogaster. We investigate the extent to which inference about the strength of selection is sensitive to assumptions about the underlying distributions of the rates of substitution and recombination, the strength of selection, heterogeneity in mutation rate, as well as the population's demographic history. We show that assuming fixed values of selection parameters in estimation leads to overestimates of the strength of selection and underestimates of the rate. We estimate parameters for an African population of D. melanogaster (ŝ∼2E−03, ) and compare these to previous estimates. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa
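The general idea behind an approximate Bayesian (ABC) estimator can be shown with a rejection sampler: draw a parameter from the prior, simulate a summary statistic, and keep the draw if the simulated summary is close to the observed one. The sketch below is a generic toy, not the estimator proposed in the paper; `abc_rejection`, the binomial toy model, the tolerance and the observed value are all assumptions for illustration.

```python
import random


def abc_rejection(observed, simulate, prior_sample, tolerance, n_draws, seed=0):
    """Return prior draws whose simulated summary is within tolerance of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_count(theta, rng, n_trials=50):
    """Toy model: number of successes in n_trials with success probability theta."""
    return sum(1 for _ in range(n_trials) if rng.random() < theta)


def uniform_prior(rng):
    """Toy prior: theta uniform on [0, 1)."""
    return rng.random()


# Hypothetical observation: 10 successes out of 50 trials.
posterior = abc_rejection(observed=10, simulate=simulate_count,
                          prior_sample=uniform_prior,
                          tolerance=1, n_draws=2000)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior; in a population-genetic setting the simulator would generate polymorphism summaries under a sweep model instead of a binomial count.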

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Discovery of Rare Variants via Sequencing: Implications for the Design of Complex Trait Association Studies

Author: AL Price
B Kerem
B Li
Bingshan Li
D Azzopardi
D Keen-Kim
David B. Allison
GA McVean
IP Gorlov
JC Cohen
JC Cohen
JK Pritchard
JK Pritchard
JM Van Liere
LR Brunham
LR Cardon
MC King
MI McCarthy
N Ahituv
N Siva
RF Service
RR Hudson
S Romeo
Suzanne M. Leal
TA Manolio
TL Slatter
W Bodmer
W Ji
Publication venue: Public Library of Science
Publication date: 01/05/2009

There is strong evidence that rare variants are involved in complex disease etiology. The first step in implicating rare variants in disease etiology is their identification through sequencing in both randomly ascertained samples (e.g., the 1,000 Genomes Project) and samples ascertained according to disease status. We investigated to what extent rare variants will be observed across the genome and in candidate genes in randomly ascertained samples, the magnitude of variant enrichment in diseased individuals, and biases that can occur due to how variants are discovered. Although sequencing cases can enrich for causal variants, when a gene or genes are not involved in disease etiology, limiting variant discovery to cases can lead to association studies with dramatically inflated false positive rates.
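A simple back-of-the-envelope calculation shows why sample size matters for rare-variant discovery. Assuming chromosomes are sampled independently and sequencing is error-free (both simplifying assumptions, and not the model used in the paper), the chance of seeing at least one copy of a variant with frequency p among n diploid individuals is 1 - (1 - p)^(2n). The function name `detection_probability` is hypothetical.

```python
def detection_probability(allele_freq, n_individuals):
    """Probability of observing at least one copy of a variant.

    Assumes 2 * n_individuals independently sampled chromosomes and
    perfect variant calling.
    """
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


# Hypothetical example: a variant at frequency 0.1% in 500 sequenced individuals.
p_seen = detection_probability(0.001, 500)
```

Under these assumptions a variant at frequency 0.001 is observed in roughly 63% of samples of 500 individuals, so a sizeable fraction of such variants would go undiscovered.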

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Genetic Crossovers Are Predicted Accurately by the Computed Human Recombination Map

Author: A Auton
A Boulton
A Kong
A Lynn
AD Peters
AG Clark
AJ Jeffreys
AJ Jeffreys
AJ Jeffreys
AJ Jeffreys
AJ Jeffreys
AJ Jeffreys
AJ Webb
AL Price
BE Stranger
C Grey
CL Yauk
D Serre
DC Crawford
DF Conrad
DM Evans
DM Greenawalt
E Jorgenson
ED Parvanov
F Baudat
FA Reed
G Coop
G Coop
G Coop
G McVean
GA McVean
Gil McVean
I Tiemann-Boege
J Buard
J Buard
JD Storey
K Paigen
KA Frazer
L Kauppi
L Kauppi
L Kauppi
M Archetti
M Cullen
M Pineda-Krch
MI Jensen-Seaman
MP Stumpf
N Arnheim
N Wang
NG Smith
P Calabrese
P Fearnhead
PA Mieczkowski
Pavel P. Khil
PP Khil
PR Bois
R Hubert
R Neumann
R. Daniel Camerini-Otero
S Keeney
S Myers
S Myers
SE Ptak
SE Ptak
TD Petes
U Friberg
V Borde
VG Cheung
W Winckler
Publication venue: Public Library of Science
Publication date: 01/01/2010

Hotspots of meiotic recombination can change rapidly over time. This instability and the reported high level of inter-individual variation in meiotic recombination put in question the accuracy of the calculated hotspot map, which is based on the summation of past genetic crossovers. To estimate the accuracy of the computed recombination rate map, we have mapped genetic crossovers to a median resolution of 70 Kb in 10 CEPH pedigrees. We then compared the positions of crossovers with the hotspots computed from HapMap data and performed extensive computer simulations to compare the observed distributions of crossovers with the distributions expected from the calculated recombination rate maps. Here we show that a population-averaged hotspot map computed from linkage disequilibrium data predicts present-day genetic crossovers well. We find that computed hotspot maps accurately estimate both the strength and the position of meiotic hotspots. An in-depth examination of crossovers that were not predicted shows that they are preferentially located in regions where hotspots are found in other populations. In summary, we find that by combining several computed population-specific maps we can capture the variation in individual hotspots to generate a hotspot map that can predict almost all present-day genetic crossovers.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central